The invention relates to coding of audio signals, in which transient signal
components are coded.

The invention further relates to decoding of audio signals. 

The invention also relates to an audio coder, an audio player, an audio system,
an audio stream and a storage medium.

The article from Purnhagen and Edler, "Objektbasierter Analyse/Synthese
Audio Coder für sehr niedrige Datenraten", ITG Fachbericht 1998, No. 146, pp. 35-40,
discloses a device for coding of audio signals at low bit-rates. A model-based
analysis-synthesis arrangement is used, in which an input signal is divided into three parts:
single sinusoids, harmonic tones, and noise. The input signal is further divided into fixed
frames of 32 ms. For all blocks and signal parts, parameters are derived based on a source
model. To improve the representation of transient signal parts, an envelope function a(t) is
derived from the input signal and applied to selected sinusoids. The envelope function
consists of two line segments determined by the parameters $r_{atk}$, $r_{dec}$ and $t_{max}$,
as shown in Fig. 1.

An object of the invention is to provide audio coding that is advantageous in 
terms of bit-rate and perception. To this end, the invention provides a method of coding and 
decoding, an audio coder, an audio player, an audio system, an audio stream and a storage 
medium as defined in the independent claims. Advantageous embodiments are defined in the 
dependent claims. 

A first embodiment of the invention comprises estimating a position of a 
transient signal component in the audio signal, matching a shape function on the transient 
signal component in case the transient signal component is gradually declining after an initial 
increase, which shape function has a substantially exponential initial behavior and a 
substantially logarithmic declining behavior; and including the position and parameters 
describing the shape function in an audio stream. Such a function has an initial behavior
substantially according to $t^n$ and a declining behavior after the initial increase substantially
according to $e^{-\alpha t}$, where $t$ is a time, and $n$ and $\alpha$ are parameters which describe a form of the
shape function. The invention is based on the insight that such a function gives a better
representation of transient signal components while the function may be described by a small 
number of parameters, which is advantageous in terms of bit-rate and perceptual quality. The 
invention is especially advantageous in embodiments where transient signal components are 
separately encoded from a sustained signal component, because especially in these 
embodiments a good representation of the transient signal components is important. 

According to a further aspect of the invention, the shape function is a Laguerre
function, which is in continuous time given by

$$c \cdot t^n e^{-\alpha t} \tag{1}$$

where c is a scaling parameter (which may be taken as one). In a practical embodiment, a
time-discrete Laguerre function is used.

Transient signal components are conceivable as a sudden change in power (or
amplitude) level or as a sudden change in waveform pattern. Detection of transient signal
components as such is known in the art. For example, in J. Kliewer and A. Mertins, 'Audio
subband coding with improved representation of transient signal segments', Proc. of
EUSIPCO-98, Signal Processing IX, Theories and applications, Rhodos, Greece, Sept. 1998,
pp. 2345-2348, a transient detection mechanism is proposed that is based on the difference in
energy levels before and after an attack start position. In a practical embodiment according to
the invention, sudden changes in amplitude level are considered.

In a preferred embodiment of the invention, the shape function is a generalized 
discrete Laguerre function. Meixner and Meixner-like functions are practical in use and give 
a surprisingly good result. Such functions are discussed in A.C. den Brinker, 'Meixner-like
functions having a rational z-transform', Int. J. Circuit Theory Appl., 23, 1995, pp. 237-246.
Parameters of these shape functions are derived in a simple way. 

In another embodiment of the invention, the shape parameters include a step 
indication in case the transient signal component is a step-like change in amplitude. The 
signal after the step-like change is advantageously coded in sustained coders. 

In another preferred embodiment of the invention, the position of the transient 
signal component is a start position. It is convenient to give the start position of the transient 
signal component for adaptive framing, wherein a frame starts at the start position of a 
transient signal component. The start position is used both for the shape function and the 
adaptive framing, which results in efficient coding. If the start position is given, it is not 
necessary to determine the start position by combining two parameters as would be necessary 
in the embodiment described by Edler.

The aforementioned and other aspects of the invention will be apparent from
and elucidated with reference to the embodiments described hereinafter.

In the drawings: 

Fig. 1 shows a known envelope function, as already discussed;
Fig. 2 shows an embodiment of an audio coder according to the invention;
Fig. 3 shows an example of a shape function according to the invention;
Fig. 4 shows a diagram of first and second order running central moments of
an input audio signal;
Fig. 5 shows an example of a shape function derived for an input audio signal;
Fig. 6 shows an embodiment of an audio player according to the invention; and
Fig. 7 shows a system comprising an audio coder and an audio player.

The drawings only show those elements that are necessary to understand the
invention.

Fig. 2 shows an audio coder 1 according to the invention, comprising an input 
unit 10 for obtaining an input audio signal x(t). The audio coder 1 separates the input signal 
into three components: transient signal components, sustained deterministic components, and 
sustained stochastic components. The audio coder 1 comprises a transient coder 11, a
sinusoidal coder 13 and a noise coder 14. The audio coder optionally comprises a gain 
compression mechanism (GC) 12. 

In this advantageous embodiment of the invention, transient coding is 
performed before sustained coding. This is advantageous because transient signal 
components are not efficiently and optimally coded in sustained coders. If sustained coders 
are used to code transient signal components, a lot of coding effort is necessary, e.g. one can 
imagine that it is difficult to code a transient signal component with only sustained sinusoids. 
Therefore, the removal of transient signal components from the audio signal to be coded 
before sustained coding is advantageous. A transient start position derived in the transient 
coder is used in the sustained coders for adaptive segmentation (adaptive framing) which 
results in a further improvement of performance of the sustained coding. 

The transient coder 11 comprises a transient detector (TD) 110, a transient
analyzer (TA) 111 and a transient synthesizer (TS) 112. First, the signal x(t) enters the
transient detector 110. This detector 110 estimates whether there is a transient signal
component and at which position. This information is fed to the transient analyzer 111. This
information may also be used in the sinusoidal coder 13 and the noise coder 14 to obtain
advantageous signal-induced segmentation. If the position of the transient signal component
is determined, the transient analyzer 111 tries to extract (the main part of) the transient signal
component. It matches a shape function to a signal segment preferably starting at an estimated
start position, and determines content underneath the shape function, e.g. a (small) number of
sinusoidal components. This information is contained in the transient code $C_T$. The transient
code $C_T$ is furnished to the transient synthesizer 112. The synthesized transient signal
component is subtracted from the input signal x(t) in subtracter 16, resulting in a signal $x_1$.
In case the GC 12 is omitted, $x_1 = x_2$. The signal $x_2$ is furnished to the sinusoidal coder 13
where it is analyzed in a sinusoidal analyzer (SA) 130, which determines the (deterministic)
sinusoidal components. This information is contained in the sinusoidal code $C_S$. From the
sinusoidal code $C_S$, the sinusoidal signal component is reconstructed by a sinusoidal
synthesizer (SS) 131. This signal is subtracted in subtracter 17 from the input $x_2$ to the
sinusoidal coder 13, resulting in a remaining signal $x_3$ devoid of (large) transient signal
components and (main) deterministic sinusoidal components. Therefore, the remaining signal
$x_3$ is assumed to mainly consist of noise. It is analyzed for its power content according to an
ERB scale in a noise analyzer (NA) 14. The noise analyzer 14 produces a noise code $C_N$.
Similar to the situation in the sinusoidal coder 13, the noise analyzer 14 may also use the start
position of the transient signal component as a position for starting a new analysis block.
The segment sizes of the sinusoidal analyzer 130 and the noise analyzer 14 are not
necessarily equal. In a multiplexer 15, an audio stream AS is constituted which includes the
codes $C_T$, $C_S$ and $C_N$. The audio stream AS is furnished to e.g. a data bus, an antenna system,
a storage medium etc.
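
The signal flow of Fig. 2 can be sketched as a chain of residuals. The following is only a minimal illustration, assuming the gain control 12 is omitted; the function name `encode_frame` and the three callables passed to it are hypothetical stand-ins for the coders 11, 13 and 14, not part of the described embodiment.

```python
def encode_frame(x, code_transient, code_sinusoids, code_noise):
    """Chain the three stages of Fig. 2 on one block of samples (GC 12 omitted).

    Each code_* argument is a hypothetical callable supplied by the caller:
    code_transient(x) and code_sinusoids(x) return (code, synthesized_signal),
    code_noise(x) returns a code only.
    """
    c_t, y_t = code_transient(x)
    x1 = [a - b for a, b in zip(x, y_t)]          # subtracter 16
    x2 = x1                                       # no gain control, so x1 = x2
    c_s, y_s = code_sinusoids(x2)
    x3 = [a - b for a, b in zip(x2, y_s)]         # subtracter 17: mostly noise remains
    c_n = code_noise(x3)
    return {"C_T": c_t, "C_S": c_s, "C_N": c_n}   # contents gathered by multiplexer 15
```

Each stage only sees what the previous stage could not model, which is why the transient coder is placed first.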

In the following, a representation of transient signal components according to 
the invention will be discussed. In this embodiment, the code for transient components $C_T$
consists of either a parametric shape plus the additional main frequency components (or other 
content) underneath the shape or a code for identifying a step-like change. According to a 
preferred embodiment of the invention, the shape function for a transient that is gradually 
declining after an initial increase, is preferably a generalized discrete Laguerre function. For 
other types of transient signal components, other functions may be used. 

An example of a generalized discrete Laguerre function is a Meixner function.
A discrete zeroth-order Meixner function g(t) is given by:

$$g(t) = \sqrt{\frac{(b)_t}{t!}}\,(1-\xi^2)^{b/2}\,\xi^t \tag{2}$$

where $t = 0, 1, 2, \ldots$ and $(b)_t = b(b+1)\cdots(b+t-1)$ is a Pochhammer symbol. The parameter b
denotes an order of generalization (b > 0) and determines the initial shape of the function:
approximately $f \propto t^{(b-1)/2}$ for small t. The parameter $\xi$ denotes a pole with $0 < \xi < 1$ and
determines the decay for larger t. The function g(t) is a positive function for all values of t.
For b = 1, a discrete Laguerre function is obtained. Furthermore, for b = 1, the z-transform of
g is a rational function in z and can thus be realized as an impulse response of a first order
infinite impulse response (IIR) filter. For all other values of b there is no rational
z-transform. The function g(t) is energy normalized, i.e. $\sum_{t=0}^{\infty} g^2(t) = 1$. The zeroth-order
Meixner function may be created recursively by:

$$g(0) = (1-\xi^2)^{b/2} \tag{3}$$

$$g(t) = \xi\,\sqrt{\frac{t+b-1}{t}}\;g(t-1) \quad \text{for } t > 0 \tag{4}$$
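
The recursion lends itself to a direct implementation. The sketch below assumes the recursion reads $g(0) = (1-\xi^2)^{b/2}$ and $g(t) = \xi\sqrt{(t+b-1)/t}\,g(t-1)$, which is the form consistent with the energy normalization stated above; the function name `meixner` and the example values b = 3, ξ = 0.9 are illustrative choices only.

```python
import math

def meixner(b, xi, n):
    """Return the first n samples of a zeroth-order Meixner function g(t)."""
    if not (b > 0 and 0 < xi < 1):
        raise ValueError("requires b > 0 and 0 < xi < 1")
    g = [(1.0 - xi * xi) ** (b / 2.0)]              # g(0)
    for t in range(1, n):
        # g(t) = xi * sqrt((t + b - 1) / t) * g(t - 1)
        g.append(xi * math.sqrt((t + b - 1.0) / t) * g[-1])
    return g

g = meixner(b=3.0, xi=0.9, n=2000)
energy = sum(v * v for v in g)                       # should be close to 1
peak = max(range(len(g)), key=g.__getitem__)         # index of the maximum
```

With these values the shape rises for the first few samples and then decays, and the accumulated energy approaches one as more samples are included.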

In another embodiment according to the invention, Meixner-like functions are 
used, because they have a rational z-transform. An example of a Meixner-like function is 
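shown in Fig. 3. A discrete zeroth-order Meixner-like function h(t) is given by its
z-transform:

$$H(z) = C_a \left(\frac{z}{z-\xi}\right)^{a+1} \tag{5}$$

where $a = 0, 1, 2, \ldots$, and $C_a$ is given by:

$$C_a = \sqrt{\frac{(1-\xi^2)^{a+1}}{P_a\!\left(\dfrac{1+\xi^2}{1-\xi^2}\right)}} \tag{6}$$

where $P_a$ is an $a$-th order Legendre polynomial, given by:

$$P_a(x) = \frac{1}{2^a\,a!}\,\frac{d^a}{dx^a}\,(x^2-1)^a \tag{7}$$
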

The parameter a denotes the order of generalization (a is a non-negative integer) and $\xi$ is the
pole with $0 < \xi < 1$. The parameter a determines the initial shape of the function: $f \propto t^a$ for
small t. The parameter $\xi$ determines the decay for large t. The function h is a positive
function for all values of t and is energy normalized. For all values of a, the function h has a
rational z-transform and can be realized as the impulse response of an IIR filter (of order
a+1).
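
As an illustration of the IIR realization, the sketch below assumes that the z-transform has the form $C_a\,(z/(z-\xi))^{a+1}$, i.e. a cascade of a+1 identical one-pole sections, which gives $h(t) \propto \binom{t+a}{a}\xi^t$. That form is an assumption chosen to match the properties listed above (order a+1, initial behavior $t^a$, positivity). The normalization is done numerically rather than through a closed-form $C_a$, and both function names are hypothetical.

```python
import math

def meixner_like(a, xi, n):
    """First n samples of an energy-normalized h(t) proportional to C(t+a, a) * xi**t."""
    raw = [math.comb(t + a, a) * xi ** t for t in range(n)]
    norm = math.sqrt(sum(v * v for v in raw))     # numerical stand-in for 1 / C_a
    return [v / norm for v in raw]

def meixner_like_iir(a, xi, n):
    """Same shape, obtained as the impulse response of (a + 1) cascaded one-pole filters."""
    y = [1.0] + [0.0] * (n - 1)                   # unit impulse
    for _ in range(a + 1):
        prev = 0.0
        for t in range(n):
            prev = y[t] + xi * prev               # y[t] = x[t] + xi * y[t-1]
            y[t] = prev
    norm = math.sqrt(sum(v * v for v in y))
    return [v / norm for v in y]
```

Under this assumption both routes give the same samples, which is the practical appeal of a rational z-transform: the shape can be generated with a few multiply-adds per sample.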

The function h(t) can be expressed in a finite discrete Laguerre series
according to:

$$h(t) = \sum_{m=0}^{a} B_m\,\varphi_m(t) \tag{8}$$

where $\varphi_m$ are discrete Laguerre functions, see the article of A.C. den Brinker. $B_m$ is given by:

$$B_m = C_a \binom{a}{m} \frac{\xi^m}{(1-\xi^2)^{a+1/2}} \tag{9}$$


First and second order running central moments $T_1(k)$ and $T_2(k)$ of a given function f(t) are
defined by equations (10) and (11) [equations not legible in the source],

where $k_0$ is the start position of the transient signal component.

With a good estimation of the running moments $T_1$ and $T_2$ of an input audio
signal (take f(t) = x(t) in equations 10 and 11), the shape parameters may be deduced.
Unfortunately, in real data a transient signal component is usually followed by a sustained
excitation phase, disturbing a possible measurement of the running moments. Fig. 4 shows
the first and second order running central moments of an input audio signal. It appears that
the running moments initially increase linearly from the assumed starting position and later
on tend to saturate. Although the shape parameters may be deduced from this curve, the
saturation is not as clear as desired for parameter extraction, i.e. it is not clear enough at
which k good estimates of $T_1$ and $T_2$ are obtained. In an advantageous embodiment of the
invention, a ratio in initial increase of the running moments $T_1$ and $T_2$ is used to deduce the
shape parameters. This measurement is advantageous in determining b (or, in case of the
zeroth-order Meixner-like function, a), since b determines the initial behavior of the shape.
From a ratio between slopes of running moments $T_1$ and $T_2$ a good estimation for b is obtained.
Simulation results show that, to a very good degree, a linear relation exists
between the ratio slope $T_1$ / slope $T_2$ and the parameter b, which is, in contrast to a Laguerre
function, slightly dependent on the decay parameter $\xi$. As a description may be used (derived
by experiments):

for Meixner: slope $T_1$ / slope $T_2$ = $b + 1/2$ (12)

for Meixner-like: slope $T_1$ / slope $T_2$ = $2a + 3/2$ (13)

wherein the $\xi$ dependence is ignored. Because $T_1$ and $T_2$ are zero for $k = k_0$, slope $T_1$ / slope $T_2$
may be approximated by $T_1 / T_2$ for a suitable k.
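
Inverting the two experimental relations is a one-line computation each. The sketch below reads them as slope ratio = b + 1/2 for the Meixner function and slope ratio = 2a + 3/2 for the Meixner-like function, and takes the values of $T_1$ and $T_2$ at a suitable k as inputs. How those moments are computed is left outside the sketch, the function names are illustrative, and a is rounded because it is a non-negative integer.

```python
def meixner_b_from_ratio(t1, t2):
    """Relation (12): T1/T2 is about b + 1/2, so b is about T1/T2 - 1/2."""
    return t1 / t2 - 0.5

def meixner_like_a_from_ratio(t1, t2):
    """Relation (13): T1/T2 is about 2a + 3/2; a is rounded to a non-negative integer."""
    return max(0, round((t1 / t2 - 1.5) / 2.0))
```

For instance, a measured ratio of 3.5 would suggest b = 3 for a Meixner shape, or a = 1 for a Meixner-like shape.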

The pole $\xi$ of the shape may be estimated in the following way. A second
order polynomial is fitted to a running central moment, e.g. $T_1$. This polynomial is fitted to a
signal segment of $T_1$ with observation time T such that leveling off is clearly visible, i.e. a
clear second order term in the polynomial fit at T. Next, the second-order polynomial is
extrapolated to its maximum and this value is assumed to be the saturation level of $T_1$. From
this value for $T_1$ and b, $\xi$ is calculated with use of equations 2 and 10, with f(t) = g(t). For a
Meixner-like function, $\xi$ is calculated from the value for $T_1$ and a with use of equations 8-10,
with f(t) = h(t).

A procedure for estimation of the decay parameter $\xi$ is as follows:
- start with some value of T;
- fit a second order polynomial to the data on 0 to T, i.e. $T_1(t) \approx c_0 + c_1 t + c_2 t^2$ for
  $t \in [0, T]$, where $c_{0,1,2}$ are fitting parameters;
- check if the quadratic term of this polynomial is essential at t = T:
  $c_0 + c_1 T + c_2 T^2 < (1-\varepsilon)(c_0 + c_1 T)$, where $\varepsilon$ represents a relative contribution of the
  quadratic term at t = T;
- if this is satisfied, then extrapolate $T_1(t)$ to its maximum and equate this with $T_1$:
  $T_1 = c_0 - c_1^2 / (4 c_2)$;
- calculate the decay parameter $\xi$ from $T_1$ and b (or a).

For Meixner-like functions, the shape parameter a is preferably rounded to integer values.
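
The fitting and extrapolation steps of this procedure can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: the threshold `eps = 0.1`, the starting observation time and its increment, the choice to enlarge T when the check fails, and the extra guard that the quadratic coefficient must be negative. The final step, computing ξ from the saturation level, is not included.

```python
def fit_quadratic(ts, ys):
    """Least-squares fit ys ~ c0 + c1*t + c2*t**2 via the normal equations."""
    s = [sum(t ** p for t in ts) for p in range(5)]
    r = [sum(y * t ** p for t, y in zip(ts, ys)) for p in range(3)]
    m = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]
    for i in range(3):                                    # Gauss-Jordan elimination
        piv = max(range(i, 3), key=lambda k: abs(m[k][i]))
        m[i], m[piv] = m[piv], m[i]
        for k in range(3):
            if k != i:
                f = m[k][i] / m[i][i]
                m[k] = [vk - f * vi for vk, vi in zip(m[k], m[i])]
    return [m[i][3] / m[i][i] for i in range(3)]

def saturation_level(t1, eps=0.1, t_start=8, t_step=4):
    """Grow T until the quadratic term is essential, then return
    (T, extrapolated maximum of the fitted parabola), or None if never satisfied."""
    T = t_start
    while T < len(t1):
        c0, c1, c2 = fit_quadratic(list(range(T + 1)), t1[:T + 1])
        linear = c0 + c1 * T
        if c2 < 0 and linear + c2 * T * T < (1.0 - eps) * linear:
            return T, c0 - c1 * c1 / (4.0 * c2)           # vertex of the parabola
        T += t_step
    return None
```

On a curve that is exactly parabolic, the extrapolated maximum equals the true vertex regardless of where the check first succeeds; on measured moments the result depends on the chosen threshold.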

Fig. 5 shows an example of a shape function derived for an input audio signal.

Some pre-processing, like performing a Hilbert transform of the data, may be 
performed in order to get a first approximation of the shape, although pre-processing is not 
essential to the invention. 

When the value at which the running moments saturate is large, i.e. in the
order of segment/frame length, the Meixner (-like) shape is discarded. In case the transient is
a step-like change in amplitude, the position of the transient is retained for a proper
segmentation in the sinusoidal coder and the noise coder.

After the start position and the shape of a transient have been determined, the 
signal content underneath the shape is estimated. A (small) number of sinusoids is estimated 
underneath the shape. This is done in an analysis-by-synthesis procedure as known in the art.
The data that is used to estimate the sinusoids, is a segment which is windowed in order to 
encompass the transient but not any consequent sustained response. Therefore, a time 
window is applied to the data before entering the analysis-by-synthesis method. In essence, 
the signal which is considered extends from the start position to some sample where the 
shape is reduced to a certain percentage of its maximum. The windowed data may be 
transformed to a frequency domain, e.g. by a Discrete Fourier Transform (DFT). In order to 
avoid low-frequency components, which presumably extend beyond the estimated transient, a 
window in the frequency domain is also applied. Next the maximum response is determined 
and the frequency associated with this maximum response. The estimated shape is modulated 
by this frequency, and the best possible fit is made to the data according to some 
predetermined criterion, e.g. a psycho-acoustic model or in a least-squares sense. This 
estimated transient segment is subtracted from the original transient and the procedure is 
repeated until a maximum number of sinusoidal components is exceeded, or hardly any 
energy is left in the segment. In essence, a transient is represented by a sum of modulated 
Meixner functions. In a practical embodiment, 6 sinusoids are estimated. If the underlying 
content mainly contains noise, a noise estimation is used or arbitrary values are given for the 
frequencies of the sinusoids. 
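
The iterative estimation described above can be sketched as a greedy loop. The code below is an illustration only: it assumes the time window has already been applied to `x`, reduces the frequency-domain window to skipping bins below `min_bin`, uses a plain least-squares fit as the criterion, and stops when the residual energy falls below a hypothetical `energy_floor` fraction of the initial energy. The function name and all default values other than the six components are assumptions.

```python
import cmath
import math

def extract_modulated_shapes(x, h, max_components=6, min_bin=1, energy_floor=1e-6):
    """Greedy analysis-by-synthesis: returns ([(bin, a_cos, a_sin), ...], residual)."""
    n = len(x)
    residual = list(x)
    e0 = sum(v * v for v in residual)
    found = []
    while len(found) < max_components and e0 > 0.0:
        if sum(v * v for v in residual) <= energy_floor * e0:
            break                                     # hardly any energy is left

        def magnitude(k):
            w = 2.0 * math.pi * k / n
            return abs(sum(r * cmath.exp(-1j * w * t) for t, r in enumerate(residual)))

        k_best = max(range(min_bin, n // 2), key=magnitude)   # maximum DFT response
        w = 2.0 * math.pi * k_best / n
        c = [hv * math.cos(w * t) for t, hv in enumerate(h)]  # cosine-modulated shape
        s = [hv * math.sin(w * t) for t, hv in enumerate(h)]  # sine-modulated shape
        cc = sum(v * v for v in c)
        ss = sum(v * v for v in s)
        cs = sum(u * v for u, v in zip(c, s))
        rc = sum(r * v for r, v in zip(residual, c))
        rs = sum(r * v for r, v in zip(residual, s))
        det = cc * ss - cs * cs
        if det <= 0.0:
            break                                     # degenerate basis, stop
        a_cos = (rc * ss - rs * cs) / det             # 2x2 least-squares solution
        a_sin = (rs * cc - rc * cs) / det
        residual = [r - a_cos * u - a_sin * v for r, u, v in zip(residual, c, s)]
        found.append((k_best, a_cos, a_sin))
    return found, residual
```

Each pass removes one modulated copy of the shape, so the returned list corresponds to the sum of modulated shapes by which the transient is represented.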

The transient code $C_T$ includes a start position of a transient and a type of
transient. The code for a transient in the case of a Meixner (-like) shape includes:
- the start position of the transient;
- an indication that the shape is a Meixner (-like) function;
- shape parameters b (or a) and $\xi$;
- modulation terms: $N_F$ frequency parameters and amplitudes for (co)sine modulated shape.

In case that the transient is essentially a sudden increase in amplitude level
where there is no clear decay in this level (relatively) shortly after the starting position, the
transient cannot be encoded with a Meixner (-like) shape. In that case, the start position is
retained in order to obtain proper signal segmentation. The code for step-transients includes:
- the start position of the transient;
- an indicator for the step.
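
The two variants of the transient code can be pictured as one record with optional fields. The layout below is purely illustrative: the class name, field names and types are hypothetical and say nothing about how the parameters are actually quantized or ordered in the audio stream.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class TransientCode:
    """Illustrative container for one transient code (hypothetical layout)."""
    start: int                                  # start position of the transient
    is_step: bool                               # True for a step-like change
    shape_kind: Optional[str] = None            # "meixner" or "meixner-like"
    shape_param: Optional[float] = None         # b (Meixner) or a (Meixner-like)
    pole: Optional[float] = None                # decay parameter xi, 0 < xi < 1
    # one (frequency, cosine amplitude, sine amplitude) triple per modulation term
    modulation: List[Tuple[float, float, float]] = field(default_factory=list)

step = TransientCode(start=128, is_step=True)
shaped = TransientCode(start=40, is_step=False, shape_kind="meixner-like",
                       shape_param=2, pole=0.8, modulation=[(0.25, 1.0, 0.0)])
```

A step code carries only the position and the step indicator, which keeps it cheap while still providing the segmentation point.
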

The performance of the subsequent sustained coding stages (sinusoidal and
noise) is improved by using the transient position in the segmentation of the signal. The
sinusoidal coder and the noise coder start at a new frame at the position of a detected
transient. In this way, one prevents averaging over signal parts which are known to exhibit
non-stationary behavior. This implies that a segment in front of a transient segment has to be
shortened, shifted or to be concatenated with a previous frame.
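
A minimal sketch of such signal-induced segmentation is given below. It implements only the "shortened" option: frames of a nominal length are laid out and a new frame is forced to start at every transient position. The function name and the half-open `(start, end)` convention are assumptions made for the example.

```python
def adaptive_frames(n_samples, frame_len, transient_starts):
    """Return (start, end) frames; a new frame begins at every transient position."""
    starts = sorted(p for p in set(transient_starts) if 0 < p < n_samples)
    bounds = [0]
    for stop in starts + [n_samples]:
        pos = bounds[-1]
        while pos + frame_len < stop:      # regular frames up to the next transient
            pos += frame_len
            bounds.append(pos)
        if stop < n_samples:
            bounds.append(stop)            # frame in front of the transient is shortened
    return list(zip(bounds, bounds[1:] + [n_samples]))
```

With 100 samples, a nominal frame length of 32 and a transient at sample 40, the frame in front of the transient shrinks to samples 32 to 40 and the next frame starts exactly at the transient.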

The audio coder 1 according to the invention optionally comprises a
gain-control element 12 in front of the sustained coders 13 and 14. It is advantageous for the
sustained coders to prevent changes in amplitude level. For a step-transient, this problem is
solved by using a segmentation in accordance with the transients. For transients represented
with a shape, the problem is partly solved by extracting the transient from the input signal.
The remnant signal still may include a significant dynamic change in amplitude level,
presumably shaped similar to the estimated shape. In order to flatten the remnant signal, the
gain control element may be used. A compression rate may be defined by:

$g_c(t)$ = [expression not legible in the source] (12)

wherein h(t) is the estimated shape and d is a parameter describing a compression rate. The
gain-control element assumes that after a transient, a stationary phase occurs with amplitude
excursions amounting to about 0.2 times the maximum in the estimated shape. A ratio r is
defined by:

r = [expression not legible in the source; it contains a factor 0.2]

wherein $M_r$ is the maximum of the remnant signal.

The compression rate parameter d is equal to r if r > 2, otherwise d is taken 0. For the
compression, only d needs to be transmitted.

Fig. 6 shows an audio player 3 according to the invention. An audio stream
AS', e.g. generated by an encoder according to Fig. 2, is obtained from a data bus, an antenna
system, a storage medium etc. The audio stream AS' is de-multiplexed in a de-multiplexer 30
to obtain the codes $C_T'$, $C_S'$ and $C_N'$. These codes are furnished to a transient synthesizer 31,
a sinusoidal synthesizer 32 and a noise synthesizer 33 respectively. From the transient code
$C_T'$, the transient signal components are calculated in the transient synthesizer 31. In case the
transient code indicates a shape function, the shape is calculated based on the received
parameters. Further, the shape content is calculated based on the frequencies and amplitudes
of the sinusoidal components. If the transient code $C_T'$ indicates a step, then no transient is
calculated. The total transient signal $y_T$ is a sum of all transients.

In case the decompression parameter d is used, i.e. if derived in the coder 1
and included in the audio stream AS', a decompression mechanism 34 is used. The gain
signal g(t) is initialized at unity, and the total amplitude decompression factor is calculated as
the product of all the different decompression factors. In case the transient is a step, no
amplitude decompression factor is calculated.

From two subsequent transient positions, a segmentation for the sinusoidal
synthesis SS 32 and the noise synthesis NS 33 is calculated. The sinusoidal code $C_S'$ is used to
generate a signal $y_S$, described as a sum of sinusoids on a given segment. The noise code $C_N'$ is
used to generate a noise signal $y_N$. Subsequent segments are added by, e.g., an overlap-add
method.

The total signal y(t) consists of the sum of the transient signal $y_T$ and the
product of the amplitude decompression g and the sum of the sinusoidal signal $y_S$ and the
noise signal $y_N$. The audio player comprises two adders 36 and 37 to sum respective signals.
The total signal is furnished to an output unit 35, which is e.g. a speaker.
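
Written per sample, the combination performed in the player is $y(t) = y_T(t) + g(t)\,(y_S(t) + y_N(t))$. A minimal sketch, with a hypothetical function name and with the gain defaulting to unity when no decompression parameter is received:

```python
def reconstruct(y_t, y_s, y_n, gain=None):
    """y(t) = y_T(t) + g(t) * (y_S(t) + y_N(t)); the gain defaults to unity."""
    if gain is None:
        gain = [1.0] * len(y_t)
    return [t + g * (s + n) for t, s, n, g in zip(y_t, y_s, y_n, gain)]
```

The transient signal bypasses the decompression gain, which is applied only to the sustained (sinusoidal plus noise) part.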

Fig. 7 shows an audio system according to the invention comprising an audio
coder 1 as shown in Fig. 2 and an audio player 3 as shown in Fig. 6. Such a system offers
playing and recording features. The audio stream AS is furnished from the audio coder to the
audio player over a communication channel 2, which may be a wireless connection, a data
bus or a storage medium. In case the communication channel 2 is a storage medium, the
storage medium may be fixed in the system or may also be a removable disc, memory stick
etc. The communication channel 2 may be part of the audio system, but will however often
be outside the audio system.

It should be noted that the above-mentioned embodiments illustrate rather than 
limit the invention, and that those skilled in the art will be able to design many alternative 
embodiments without departing from the scope of the appended claims. In the claims, any 
reference signs placed between parentheses shall not be construed as limiting the claim. The 
word 'comprising' does not exclude the presence of other elements or steps than those listed 
in a claim. The invention can be implemented by means of hardware comprising several 
distinct elements, and by means of a suitably programmed computer. In a device claim 
enumerating several means, several of these means can be embodied by one and the same 
item of hardware. The mere fact that certain measures are recited in mutually different
dependent claims does not indicate that a combination of these measures cannot be used to
advantage.

In summary, the invention provides coding and decoding of an audio signal
including estimating a position of a transient signal component in the audio signal, matching
a shape function on the transient signal component in case the transient signal component is
gradually declining after an initial increase, which shape function has a substantially
exponential initial behavior and a substantially logarithmic declining behavior; and including
the position and parameters describing the shape function in an audio stream.



